Abstract

We prove an exponential probability tail inequality for positive semidefinite quadratic forms 
in a subgaussian random vector. The bound is analogous to one that holds when the vector has 
independent Gaussian entries. 

1 Introduction 

Suppose that $x = (x_1, \dots, x_n)$ is a random vector. Let $A \in \mathbb{R}^{m \times n}$ be a fixed matrix. A natural
quantity that arises in many settings is the quadratic form $\|Ax\|^2 = x^\top (A^\top A) x$. Throughout, $\|v\|$
denotes the Euclidean norm of a vector $v$, and $\|M\|$ denotes the spectral (operator) norm of a
matrix $M$. We are interested in how close $\|Ax\|^2$ is to its expectation.

Consider the special case where $x_1, \dots, x_n$ are independent standard Gaussian random variables.
The following proposition provides an (upper) tail bound for $\|Ax\|^2$.

Proposition 1. Let $A \in \mathbb{R}^{m \times n}$ be a matrix, and let $\Sigma := A^\top A$. Let $x = (x_1, \dots, x_n)$ be an
isotropic multivariate Gaussian random vector with mean zero. For all $t > 0$,

$$\Pr\left[ \|Ax\|^2 > \operatorname{tr}(\Sigma) + 2\sqrt{\operatorname{tr}(\Sigma^2)\, t} + 2\|\Sigma\|\, t \right] \leq e^{-t}.$$



The proof, given in Appendix A.2, is straightforward given the rotational invariance of the
multivariate Gaussian distribution, together with a tail bound for linear combinations of $\chi^2$ random
variables due to Laurent and Massart (2000). We note that a slightly weaker form of Proposition 1
can be proved directly using Gaussian concentration (Pisier, 1989).
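A quick simulation can make the statement of Proposition 1 concrete. The sketch below is only an illustration: the matrix sizes, the seed, the sample count, and the helper name `gaussian_quadratic_threshold` are arbitrary choices made here, not part of the original note. It compares the empirical frequency of the tail event with $e^{-t}$.

```python
import numpy as np

def gaussian_quadratic_threshold(A, t):
    """Threshold tr(S) + 2*sqrt(tr(S^2) t) + 2*||S|| t for S = A^T A (hypothetical helper)."""
    S = A.T @ A
    tr1 = np.trace(S)
    tr2 = np.trace(S @ S)
    op = np.linalg.norm(S, 2)  # spectral norm of S
    return tr1 + 2.0 * np.sqrt(tr2 * t) + 2.0 * op * t

rng = np.random.default_rng(0)        # fixed seed, arbitrary
m, n, t = 5, 8, 1.0                   # illustrative sizes, not from the text
A = rng.standard_normal((m, n))
x = rng.standard_normal((20000, n))   # each row is one draw of an isotropic Gaussian vector
sq_norms = np.sum((x @ A.T) ** 2, axis=1)   # ||A x||^2 for every draw
empirical_tail = np.mean(sq_norms > gaussian_quadratic_threshold(A, t))
bound = np.exp(-t)
print(empirical_tail, bound)
```

The empirical frequency is expected to sit below the bound, typically with some room to spare, since the inequality is not claimed to be an equality for any particular $A$.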



In this note, we consider the case where $x = (x_1, \dots, x_n)$ is a subgaussian random vector. By
this, we mean that there exists a $\sigma \geq 0$ such that for all $\alpha \in \mathbb{R}^n$,

$$\mathbb{E}\left[ \exp\left( \alpha^\top x \right) \right] \leq \exp\left( \|\alpha\|^2 \sigma^2 / 2 \right).$$

We provide a sharp upper tail bound for this case analogous to one that holds in the Gaussian case
(indeed, the same as Proposition 1 when $\sigma = 1$).
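As a concrete instance of this definition, a vector of independent Rademacher signs satisfies the condition with $\sigma = 1$, because its moment generating function is $\prod_i \cosh(\alpha_i)$ and $\cosh(a) \leq e^{a^2/2}$. The following sketch checks this numerically on random directions; the dimension, the scale of the test directions, and the seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                   # fixed seed, arbitrary
n = 6                                            # illustrative dimension
alphas = 2.0 * rng.standard_normal((1000, n))    # random test directions alpha

# Exact log-MGF for independent Rademacher entries:
# log E[exp(alpha^T x)] = sum_i log cosh(alpha_i)
log_mgf = np.sum(np.log(np.cosh(alphas)), axis=1)

# Subgaussian bound with sigma = 1: ||alpha||^2 / 2
log_bound = 0.5 * np.sum(alphas ** 2, axis=1)

worst_gap = np.max(log_mgf - log_bound)          # should not be positive (up to rounding)
print(worst_gap)
```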
Tail inequalities for sums of random vectors



One motivation for our main result comes from the following observations about sums of random
vectors. Let $a_1, \dots, a_n$ be vectors in a Euclidean space, and let $A = [a_1 | \cdots | a_n]$ be the matrix with
$a_i$ as its $i$th column. Consider the squared norm of the random sum

$$\|Ax\|^2 = \Big\| \sum_{i=1}^n a_i x_i \Big\|^2 \tag{1}$$

where $x := (x_1, \dots, x_n)$ is a martingale difference sequence with $\mathbb{E}[x_i \mid x_1, \dots, x_{i-1}] = 0$ and
$\mathbb{E}[x_i^2 \mid x_1, \dots, x_{i-1}] = \sigma^2$. Under mild boundedness assumptions on the $x_i$, the probability that the
squared norm in (1) is much larger than its expectation

$$\mathbb{E}\left[ \|Ax\|^2 \right] = \sigma^2 \sum_{i=1}^n \|a_i\|^2 = \sigma^2 \operatorname{tr}(A^\top A)$$

falls off exponentially fast. This can be shown, for instance, using the following proposition by taking
$u_i := a_i x_i$ (the proof is standard, but we give it for completeness in Appendix A.1).

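Proposition 2. Let $u_1, \dots, u_n$ be a martingale difference vector sequence (i.e., $\mathbb{E}[u_i \mid u_1, \dots, u_{i-1}] = 0$
for all $i = 1, \dots, n$) such that

$$\sum_{i=1}^n \mathbb{E}\left[ \|u_i\|^2 \mid u_1, \dots, u_{i-1} \right] \leq v \quad \text{and} \quad \|u_i\| \leq b$$

for all $i = 1, \dots, n$, almost surely. For all $t > 0$,

$$\Pr\left[ \Big\| \sum_{i=1}^n u_i \Big\| > \sqrt{v} + \sqrt{8vt} + (4/3)\, b t \right] \leq e^{-t}.$$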



After squaring the quantities in the stated probabilistic event, Proposition 2 gives the bound

$$\|Ax\|^2 \leq \sigma^2 \operatorname{tr}(A^\top A) + \sigma^2 \cdot O\Big( \operatorname{tr}(A^\top A)\big(\sqrt{t} + t\big) + \sqrt{\operatorname{tr}(A^\top A)}\, \max_i \|a_i\| \big(t + t^{3/2}\big) + \max_i \|a_i\|^2\, t^2 \Big)$$

with probability at least $1 - e^{-t}$ when the $x_i$ are almost surely bounded by 1 (or any constant).
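The squaring step can be written out explicitly. The expansion below is a worked sketch in the notation of Proposition 2, under the assumption that one takes $v = \sigma^2 \operatorname{tr}(A^\top A)$ and $b = \max_i \|a_i\|$ (valid for $u_i = a_i x_i$ with $|x_i| \leq 1$):

```latex
\begin{align*}
\Big(\sqrt{v} + \sqrt{8vt} + \tfrac{4}{3}\,bt\Big)^2
  &= v + 8vt + \tfrac{16}{9}\,b^2 t^2
     + 4\sqrt{2}\,v\sqrt{t} + \tfrac{8}{3}\,\sqrt{v}\,b\,t
     + \tfrac{16\sqrt{2}}{3}\,\sqrt{v}\,b\,t^{3/2} \\
  &= v + O\Big( v\big(\sqrt{t} + t\big) + \sqrt{v}\,b\,\big(t + t^{3/2}\big) + b^2 t^2 \Big).
\end{align*}
```

Substituting the assumed values of $v$ and $b$ gives a bound of the same shape as the one stated above, with $\sigma$ treated as a constant in the lower-order terms.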

Unfortunately, this bound obtained from Proposition 2 can be suboptimal when the $x_i$ are
subgaussian. For instance, if the $x_i$ are Rademacher random variables, so $\Pr[x_i = +1] = \Pr[x_i = -1] = 1/2$,
then it is known that

$$\|Ax\|^2 \leq \operatorname{tr}(A^\top A) + O\Big( \sqrt{\operatorname{tr}\big((A^\top A)^2\big)\, t} + \|A\|^2 t \Big) \tag{2}$$

with probability at least $1 - e^{-t}$. A similar result holds for any subgaussian distribution on the
$x_i$ (Hanson and Wright, 1971). This is an improvement over the previous bound because the
deviation terms (i.e., those involving $t$) can be significantly smaller, especially for large $t$.

In this work, we give a simple proof of (2) with explicit constants that match the analogous
bound when the $x_i$ are independent standard Gaussian random variables.
2 Positive semidefinite quadratic forms

Our main theorem, given below, is a generalization of (2).

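Theorem 1. Let $A \in \mathbb{R}^{m \times n}$ be a matrix, and let $\Sigma := A^\top A$. Suppose that $x = (x_1, \dots, x_n)$ is a
random vector such that, for some $\mu \in \mathbb{R}^n$ and $\sigma \geq 0$,

$$\mathbb{E}\left[ \exp\left( \alpha^\top (x - \mu) \right) \right] \leq \exp\left( \|\alpha\|^2 \sigma^2 / 2 \right) \tag{3}$$

for all $\alpha \in \mathbb{R}^n$. For all $t > 0$,

$$\Pr\left[ \|Ax\|^2 > \sigma^2 \cdot \Big( \operatorname{tr}(\Sigma) + 2\sqrt{\operatorname{tr}(\Sigma^2)\, t} + 2\|\Sigma\|\, t \Big) + \|A\mu\|^2 \cdot \left( 1 + 4\left( \frac{\|\Sigma\|^2}{\operatorname{tr}(\Sigma^2)}\, t \right)^{1/2} + \frac{4\|\Sigma\|^2}{\operatorname{tr}(\Sigma^2)}\, t \right)^{1/2} \right] \leq e^{-t}.$$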



Remark 1. Note that when $\mu = 0$ and $\sigma = 1$ we have:

$$\Pr\left[ \|Ax\|^2 > \operatorname{tr}(\Sigma) + 2\sqrt{\operatorname{tr}(\Sigma^2)\, t} + 2\|\Sigma\|\, t \right] \leq e^{-t},$$

which is the same as Proposition 1.
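For readers who want to evaluate the threshold numerically, here is a minimal sketch. It uses the form of the threshold that appears in the proof of Theorem 1, namely $\sigma^2(\operatorname{tr}(\Sigma) + \tau) + \|A\mu\|^2 \sqrt{1 + 2\|\Sigma\|\tau/\operatorname{tr}(\Sigma^2)}$ with $\tau = 2\sqrt{\operatorname{tr}(\Sigma^2)t} + 2\|\Sigma\|t$. The function name and its signature are hypothetical, introduced only for this illustration.

```python
import numpy as np

def subgaussian_quadratic_threshold(A, t, sigma=1.0, mu=None):
    """Threshold eps such that Pr[||A x||^2 > eps] <= exp(-t) under condition (3).

    Hypothetical helper: follows the choice of eps and tau in the proof of Theorem 1.
    Assumes A is not the zero matrix when mu is given (otherwise tr(S^2) = 0).
    """
    S = A.T @ A
    tr1 = np.trace(S)
    tr2 = np.trace(S @ S)
    op = np.linalg.norm(S, 2)                       # spectral norm ||S||
    tau = 2.0 * np.sqrt(tr2 * t) + 2.0 * op * t     # deviation term
    threshold = sigma ** 2 * (tr1 + tau)
    if mu is not None:
        mean_sq = float(np.sum((A @ mu) ** 2))      # ||A mu||^2
        threshold += mean_sq * np.sqrt(1.0 + 2.0 * op * tau / tr2)
    return threshold

# Centered case with sigma = 1 reduces to the Gaussian threshold of Proposition 1.
centered = subgaussian_quadratic_threshold(np.eye(4), t=1.0)
shifted = subgaussian_quadratic_threshold(np.eye(4), t=1.0, mu=np.ones(4))
print(centered, shifted)
```

With $A = I_4$ and $t = 1$ the centered threshold is $4 + 2\sqrt{4} + 2 = 10$, and a mean vector of all ones adds $\|A\mu\|^2 \cdot 2 = 8$.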

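Remark 2. Our proof actually establishes the following upper bounds on the moment generating
function of $\|Ax\|^2$ for $0 \leq \eta < 1/(2\sigma^2 \|\Sigma\|)$:

$$\begin{aligned}
\mathbb{E}\left[ \exp\left( \eta \|Ax\|^2 \right) \right]
&\leq \mathbb{E}\left[ \exp\left( \sigma^2 \|A^\top z\|^2 \eta + \mu^\top A^\top z \sqrt{2\eta} \right) \right] \\
&\leq \exp\left( \sigma^2 \operatorname{tr}(\Sigma)\, \eta + \frac{\sigma^4 \operatorname{tr}(\Sigma^2)\, \eta^2 + \|A\mu\|^2 \eta}{1 - 2\sigma^2 \|\Sigma\| \eta} \right),
\end{aligned}$$

where $z$ is a vector of $m$ independent standard Gaussian random variables.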

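Proof of Theorem 1. Let $z$ be a vector of $m$ independent standard Gaussian random variables
(sampled independently of $x$). For any $\alpha \in \mathbb{R}^m$,

$$\mathbb{E}\left[ \exp\left( z^\top \alpha \right) \right] = \exp\left( \|\alpha\|^2 / 2 \right).$$

Thus, for any $\lambda \in \mathbb{R}$ and $\varepsilon \geq 0$,

$$\begin{aligned}
\mathbb{E}\left[ \exp\left( \lambda z^\top A x \right) \right]
&\geq \mathbb{E}\left[ \exp\left( \lambda z^\top A x \right) \,\Big|\, \|Ax\|^2 > \varepsilon \right] \cdot \Pr\left[ \|Ax\|^2 > \varepsilon \right] \\
&\geq \exp\left( \frac{\lambda^2 \varepsilon}{2} \right) \cdot \Pr\left[ \|Ax\|^2 > \varepsilon \right].
\end{aligned} \tag{4}$$

Moreover,

$$\begin{aligned}
\mathbb{E}\left[ \exp\left( \lambda z^\top A x \right) \right]
&= \mathbb{E}\left[ \mathbb{E}\left[ \exp\left( \lambda z^\top A (x - \mu) \right) \,\Big|\, z \right] \exp\left( \lambda z^\top A \mu \right) \right] \\
&\leq \mathbb{E}\left[ \exp\left( \frac{\lambda^2 \sigma^2}{2} \|A^\top z\|^2 + \lambda \mu^\top A^\top z \right) \right].
\end{aligned} \tag{5}$$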
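Let $U S V^\top$ be a singular value decomposition of $A$, where $U$ and $V$ are, respectively, matrices of
orthonormal left and right singular vectors, and $S = \operatorname{diag}(\sqrt{\rho_1}, \dots, \sqrt{\rho_m})$ is the diagonal matrix of
corresponding singular values. Note that

$$\|\rho\|_1 = \sum_{i=1}^m \rho_i = \operatorname{tr}(\Sigma), \quad \|\rho\|_2^2 = \sum_{i=1}^m \rho_i^2 = \operatorname{tr}(\Sigma^2), \quad \text{and} \quad \|\rho\|_\infty = \max_i \rho_i = \|\Sigma\|.$$

By rotational invariance, $y := U^\top z$ is an isotropic multivariate Gaussian random vector with mean
zero. Therefore $\|A^\top z\|^2 = z^\top U S^2 U^\top z = \rho_1 y_1^2 + \cdots + \rho_m y_m^2$ and $\mu^\top A^\top z = \nu^\top y = \nu_1 y_1 + \cdots + \nu_m y_m$,
where $\nu := S V^\top \mu$ (note that $\|\nu\|^2 = \|S V^\top \mu\|^2 = \|A\mu\|^2$). Let $\gamma := \lambda^2 \sigma^2 / 2$. By Lemma 1,

$$\mathbb{E}\left[ \exp\left( \gamma \sum_{i=1}^m \rho_i y_i^2 + \frac{\sqrt{2\gamma}}{\sigma} \sum_{i=1}^m \nu_i y_i \right) \right]
\leq \exp\left( \|\rho\|_1 \gamma + \frac{\|\rho\|_2^2 \gamma^2 + \|\nu\|^2 \gamma / \sigma^2}{1 - 2\|\rho\|_\infty \gamma} \right) \tag{6}$$

for $0 \leq \gamma < 1/(2\|\rho\|_\infty)$. Combining (4), (5), and (6) gives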



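$$\Pr\left[ \|Ax\|^2 > \varepsilon \right] \leq \exp\left( -\varepsilon \gamma / \sigma^2 + \|\rho\|_1 \gamma + \frac{\|\rho\|_2^2 \gamma^2 + \|\nu\|^2 \gamma / \sigma^2}{1 - 2\|\rho\|_\infty \gamma} \right)$$

for $0 \leq \gamma < 1/(2\|\rho\|_\infty)$ and $\varepsilon \geq 0$. Choosing

$$\varepsilon := \sigma^2 \left( \|\rho\|_1 + \tau \right) + \|\nu\|^2 \sqrt{1 + \frac{2\|\rho\|_\infty \tau}{\|\rho\|_2^2}}
\quad \text{and} \quad
\gamma := \frac{1}{2\|\rho\|_\infty} \left( 1 - \sqrt{\frac{\|\rho\|_2^2}{\|\rho\|_2^2 + 2\|\rho\|_\infty \tau}} \right),$$

we have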



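$$\begin{aligned}
\Pr\left[ \|Ax\|^2 > \sigma^2 \left( \|\rho\|_1 + \tau \right) + \|\nu\|^2 \sqrt{1 + \frac{2\|\rho\|_\infty \tau}{\|\rho\|_2^2}} \right]
&\leq \exp\left( -\frac{\|\rho\|_2^2}{2\|\rho\|_\infty^2} \left( 1 + \frac{\|\rho\|_\infty \tau}{\|\rho\|_2^2} - \sqrt{1 + \frac{2\|\rho\|_\infty \tau}{\|\rho\|_2^2}} \right) \right) \\
&= \exp\left( -\frac{\|\rho\|_2^2}{2\|\rho\|_\infty^2}\, h_1\!\left( \frac{\|\rho\|_\infty \tau}{\|\rho\|_2^2} \right) \right),
\end{aligned}$$

where $h_1(a) := 1 + a - \sqrt{1 + 2a}$, which has the inverse function $h_1^{-1}(b) = \sqrt{2b} + b$. The result follows
by setting $\tau := 2\sqrt{\|\rho\|_2^2\, t} + 2\|\rho\|_\infty t = 2\sqrt{\operatorname{tr}(\Sigma^2)\, t} + 2\|\Sigma\|\, t$. □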
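The algebra behind the last display and the choice of $\tau$ is short enough to spell out. The following is a worked sketch; the shorthands $a$, $s$, and $b$ are introduced here for convenience and are not part of the original argument.

```latex
% Shorthands (assumed here): a := \|\rho\|_\infty \tau / \|\rho\|_2^2, \quad s := \sqrt{1 + 2a}.
% The chosen \gamma satisfies 1 - 2\|\rho\|_\infty\gamma = 1/s, so the two \|\nu\|^2 terms cancel:
%   -\|\nu\|^2 s \gamma/\sigma^2 + \|\nu\|^2 \gamma / (\sigma^2 (1 - 2\|\rho\|_\infty\gamma)) = 0.
\begin{align*}
\gamma &= \frac{1 - 1/s}{2\|\rho\|_\infty}, \qquad \tau = \frac{a\,\|\rho\|_2^2}{\|\rho\|_\infty}, \\
-\tau\gamma + \frac{\|\rho\|_2^2\gamma^2}{1 - 2\|\rho\|_\infty\gamma}
  &= \frac{\|\rho\|_2^2}{2\|\rho\|_\infty^2}
     \left( -a + \frac{a}{s} + \frac{s}{2} - 1 + \frac{1}{2s} \right) \\
  &= \frac{\|\rho\|_2^2}{2\|\rho\|_\infty^2}\,\bigl( s - 1 - a \bigr)
   = -\frac{\|\rho\|_2^2}{2\|\rho\|_\infty^2}\, h_1(a),
  \qquad\text{using } \frac{a}{s} + \frac{1}{2s} = \frac{1 + 2a}{2s} = \frac{s}{2}. \\
\intertext{Setting the exponent equal to $-t$ gives $h_1(a) = b$ with $b := 2\|\rho\|_\infty^2 t / \|\rho\|_2^2$, hence}
a &= \sqrt{2b} + b = \frac{2\|\rho\|_\infty\sqrt{t}}{\|\rho\|_2} + \frac{2\|\rho\|_\infty^2 t}{\|\rho\|_2^2},
  \qquad
  \tau = 2\sqrt{\|\rho\|_2^2\,t} + 2\|\rho\|_\infty t, \\
s &= \sqrt{1 + 2a} = 1 + \frac{2\|\rho\|_\infty\sqrt{t}}{\|\rho\|_2}.
\end{align*}
```

The last line shows that, for this $\tau$, the factor multiplying $\|\nu\|^2 = \|A\mu\|^2$ in the threshold simplifies to $1 + 2\sqrt{\|\Sigma\|^2 t / \operatorname{tr}(\Sigma^2)}$.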

The following lemma is a standard estimate of the logarithmic moment generating function of a
quadratic form in standard Gaussian random variables, proved much along the lines of the estimate
due to Laurent and Massart (2000).

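Lemma 1. Let $z$ be a vector of $m$ independent standard Gaussian random variables. Fix any
non-negative vector $\alpha \in \mathbb{R}_+^m$ and any vector $\beta \in \mathbb{R}^m$. If $0 \leq \lambda < 1/(2\|\alpha\|_\infty)$, then

$$\log \mathbb{E}\left[ \exp\left( \lambda \sum_{i=1}^m \alpha_i z_i^2 + \sum_{i=1}^m \beta_i z_i \right) \right]
\leq \|\alpha\|_1 \lambda + \frac{\|\alpha\|_2^2 \lambda^2 + \|\beta\|_2^2 / 2}{1 - 2\|\alpha\|_\infty \lambda}.$$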
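Proof. Fix $\lambda \in \mathbb{R}$ such that $0 \leq \lambda < 1/(2\|\alpha\|_\infty)$, and let $\eta_i := 1/\sqrt{1 - 2\alpha_i \lambda} > 0$ for $i = 1, \dots, m$.
We have

$$\begin{aligned}
\mathbb{E}\left[ \exp\left( \lambda \alpha_i z_i^2 + \beta_i z_i \right) \right]
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -z_i^2/2 \right) \exp\left( \lambda \alpha_i z_i^2 + \beta_i z_i \right) dz_i \\
&= \eta_i \exp\left( \frac{\beta_i^2 \eta_i^2}{2} \right) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi \eta_i^2}} \exp\left( -\frac{1}{2\eta_i^2} \left( z_i - \beta_i \eta_i^2 \right)^2 \right) dz_i,
\end{aligned}$$

so

$$\log \mathbb{E}\left[ \exp\left( \lambda \sum_{i=1}^m \alpha_i z_i^2 + \sum_{i=1}^m \beta_i z_i \right) \right]
= \frac{1}{2} \sum_{i=1}^m \beta_i^2 \eta_i^2 + \frac{1}{2} \sum_{i=1}^m \log \eta_i^2.$$

The right-hand side can be bounded using the inequalities

$$\frac{1}{2} \sum_{i=1}^m \log \eta_i^2
= -\frac{1}{2} \sum_{i=1}^m \log\left( 1 - 2\alpha_i \lambda \right)
= \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^{\infty} \frac{(2\alpha_i \lambda)^j}{j}
\leq \|\alpha\|_1 \lambda + \frac{\|\alpha\|_2^2 \lambda^2}{1 - 2\|\alpha\|_\infty \lambda}$$

and

$$\frac{1}{2} \sum_{i=1}^m \beta_i^2 \eta_i^2 \leq \frac{\|\beta\|_2^2 / 2}{1 - 2\|\alpha\|_\infty \lambda}.$$

□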
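Since the proof gives the logarithmic moment generating function in closed form, the inequality of Lemma 1 can be checked numerically. The sketch below evaluates both sides on a grid of $\lambda$ values; the function names, the particular vectors $\alpha$ and $\beta$, and the grid are illustrative assumptions.

```python
import numpy as np

def exact_log_mgf(lam, alpha, beta):
    """Closed form from the proof: (1/2) sum beta_i^2 eta_i^2 + (1/2) sum log eta_i^2."""
    eta_sq = 1.0 / (1.0 - 2.0 * alpha * lam)
    return 0.5 * np.sum(beta ** 2 * eta_sq) + 0.5 * np.sum(np.log(eta_sq))

def lemma1_bound(lam, alpha, beta):
    """Right-hand side of Lemma 1 (alpha must be entrywise non-negative)."""
    denom = 1.0 - 2.0 * np.max(alpha) * lam
    return np.sum(alpha) * lam + (np.sum(alpha ** 2) * lam ** 2 + 0.5 * np.sum(beta ** 2)) / denom

rng = np.random.default_rng(0)                  # fixed seed, arbitrary
alpha = rng.uniform(0.0, 1.0, size=5)           # non-negative vector
beta = rng.standard_normal(5)
# Stay strictly inside the admissible range 0 <= lambda < 1 / (2 ||alpha||_inf).
lams = np.linspace(0.0, 0.99 / (2.0 * np.max(alpha)), 50)
gaps = np.array([lemma1_bound(l, alpha, beta) - exact_log_mgf(l, alpha, beta) for l in lams])
print(gaps.min())
```

At $\lambda = 0$ both sides equal $\|\beta\|_2^2/2$, so the smallest gap is zero up to rounding.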



Example: fixed-design regression with subgaussian noise 

We give a simple application of Theorem 1 to fixed-design linear regression with the ordinary least
squares estimator.

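Let $x_1, \dots, x_n$ be fixed design vectors in $\mathbb{R}^d$. Let the responses $y_1, \dots, y_n$ be random variables
for which there exists $\sigma > 0$ such that

$$\mathbb{E}\left[ \exp\left( \sum_{i=1}^n \alpha_i \left( y_i - \mathbb{E}[y_i] \right) \right) \right] \leq \exp\left( \sigma^2 \sum_{i=1}^n \alpha_i^2 / 2 \right)$$

for any $\alpha_1, \dots, \alpha_n \in \mathbb{R}$. This condition is satisfied, for instance, if

$$y_i = \mathbb{E}[y_i] + \varepsilon_i$$

for independent subgaussian zero-mean noise variables $\varepsilon_1, \dots, \varepsilon_n$. Let $\Sigma := \sum_{i=1}^n x_i x_i^\top / n$, which
we assume is invertible without loss of generality. Let

$$\beta := \Sigma^{-1} \left( \frac{1}{n} \sum_{i=1}^n x_i\, \mathbb{E}[y_i] \right)$$

be the coefficient vector of minimum expected squared error. The ordinary least squares estimator
is given by

$$\hat\beta := \Sigma^{-1} \left( \frac{1}{n} \sum_{i=1}^n x_i y_i \right).$$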
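The excess loss $R(\hat\beta)$ of $\hat\beta$ is the difference between the expected squared error of $\hat\beta$ and that of $\beta$:

$$R(\hat\beta) := \mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^n \left( x_i^\top \hat\beta - y_i \right)^2 - \frac{1}{n} \sum_{i=1}^n \left( x_i^\top \beta - y_i \right)^2 \right].$$

It is easy to see that

$$R(\hat\beta) = \left\| \Sigma^{1/2} (\hat\beta - \beta) \right\|^2 = \Big\| \sum_{i=1}^n \frac{1}{n} \Sigma^{-1/2} x_i \left( y_i - \mathbb{E}[y_i] \right) \Big\|^2.$$

By Theorem 1,

$$\Pr\left[ R(\hat\beta) > \frac{\sigma^2 \left( d + 2\sqrt{dt} + 2t \right)}{n} \right] \leq e^{-t}.$$

Note that in the case that $\mathbb{E}[(y_i - \mathbb{E}[y_i])^2] = \sigma^2$ for each $i$, then

$$\mathbb{E}\left[ R(\hat\beta) \right] = \frac{\sigma^2 d}{n},$$

so the tail inequality above is essentially tight when the $y_i$ are independent Gaussian random
variables.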
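The reduction to Theorem 1 can be traced numerically. In the sketch below, the matrix `A` has columns $\Sigma^{-1/2} x_i / n$, so that the excess loss equals $\|A(y - \mathbb{E}[y])\|^2$ and $A^\top A$ has $d$ eigenvalues equal to $1/n$. The random design, the Gaussian noise, and all sizes are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed, arbitrary
n, d, sigma, t = 50, 3, 0.5, 2.0             # illustrative values, not from the text
X = rng.standard_normal((n, d))              # rows are the design vectors x_i
Sigma = X.T @ X / n

# Symmetric inverse square root of Sigma via its eigendecomposition.
evals, evecs = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
A = Sigma_inv_sqrt @ X.T / n                 # columns are Sigma^{-1/2} x_i / n

G = A.T @ A                                  # plays the role of A^T A in Theorem 1
tr1, tr2, op = np.trace(G), np.trace(G @ G), np.linalg.norm(G, 2)
threshold = sigma ** 2 * (tr1 + 2 * np.sqrt(tr2 * t) + 2 * op * t)
closed_form = sigma ** 2 * (d + 2 * np.sqrt(d * t) + 2 * t) / n

# One simulated data set with E[y_i] = x_i^T beta and Gaussian noise.
beta = rng.standard_normal(d)
noise = sigma * rng.standard_normal(n)
y = X @ beta + noise
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least squares
diff = beta_hat - beta
excess = diff @ Sigma @ diff                 # ||Sigma^{1/2} (beta_hat - beta)||^2
quad_form = np.sum((A @ noise) ** 2)         # ||A (y - E[y])||^2
print(excess, quad_form, threshold)
```

The traces and spectral norm computed from `G` reproduce $d/n$, $d/n^2$, and $1/n$, which is how the general threshold collapses to $\sigma^2(d + 2\sqrt{dt} + 2t)/n$.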
A Standard tail inequalities

A.1 Martingale tail inequalities

The following is a standard form of Bernstein's inequality stated for martingale difference sequences. 

Lemma 2 (Bernstein's inequality for martingales). Let $d_1, \dots, d_n$ be a martingale difference sequence
with respect to random variables $x_1, \dots, x_n$ (i.e., $\mathbb{E}[d_i \mid x_1, \dots, x_{i-1}] = 0$ for all $i = 1, \dots, n$)
such that $|d_i| \leq b$ and $\sum_{i=1}^n \mathbb{E}[d_i^2 \mid x_1, \dots, x_{i-1}] \leq v$. For all $t > 0$,

$$\Pr\left[ \sum_{i=1}^n d_i > \sqrt{2vt} + (2/3)\, b t \right] \leq e^{-t}.$$



The proof of Proposition 2, which is entirely standard, is an immediate consequence of the
following two lemmas together with Jensen's inequality.
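Lemma 3. Let $u_1, \dots, u_n$ be random vectors such that

$$\sum_{i=1}^n \mathbb{E}\left[ \|u_i\|^2 \mid u_1, \dots, u_{i-1} \right] \leq v \quad \text{and} \quad \|u_i\| \leq b$$

for all $i = 1, \dots, n$, almost surely. For all $t > 0$,

$$\Pr\left[ \Big\| \sum_{i=1}^n u_i \Big\| - \mathbb{E}\left[ \Big\| \sum_{i=1}^n u_i \Big\| \right] > \sqrt{8vt} + (4/3)\, b t \right] \leq e^{-t}.$$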



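Proof. Let $s_n := u_1 + \cdots + u_n$. Define the Doob martingale

$$d_i := \mathbb{E}\left[ \|s_n\| \mid u_1, \dots, u_i \right] - \mathbb{E}\left[ \|s_n\| \mid u_1, \dots, u_{i-1} \right]$$

for $i = 1, \dots, n$, so $d_1 + \cdots + d_n = \|s_n\| - \mathbb{E}[\|s_n\|]$. First, clearly, $\mathbb{E}[d_i \mid u_1, \dots, u_{i-1}] = 0$. Next, the
triangle inequality implies

$$\begin{aligned}
d_i &= \mathbb{E}\left[ \|(s_n - u_i) + u_i\| \mid u_1, \dots, u_i \right] - \mathbb{E}\left[ \|(s_n - u_i) + u_i\| \mid u_1, \dots, u_{i-1} \right] \\
&\leq \mathbb{E}\left[ \|s_n - u_i\| + \|u_i\| \mid u_1, \dots, u_i \right] - \mathbb{E}\left[ \big| \|s_n - u_i\| - \|u_i\| \big| \mid u_1, \dots, u_{i-1} \right] \\
&= \|u_i\| + \mathbb{E}\left[ \|u_i\| \mid u_1, \dots, u_{i-1} \right],
\end{aligned}$$

and similarly, $d_i \geq -\|u_i\| - \mathbb{E}[\|u_i\| \mid u_1, \dots, u_{i-1}]$.

Therefore,

$$|d_i| \leq \|u_i\| + \mathbb{E}\left[ \|u_i\| \mid u_1, \dots, u_{i-1} \right] \leq 2b \quad \text{almost surely.}$$

Moreover,

$$\begin{aligned}
\mathbb{E}\left[ d_i^2 \mid u_1, \dots, u_{i-1} \right]
&\leq \mathbb{E}\left[ \|u_i\|^2 + 2\|u_i\| \cdot \mathbb{E}\left[ \|u_i\| \mid u_1, \dots, u_{i-1} \right] + \mathbb{E}\left[ \|u_i\| \mid u_1, \dots, u_{i-1} \right]^2 \,\Big|\, u_1, \dots, u_{i-1} \right] \\
&= \mathbb{E}\left[ \|u_i\|^2 \mid u_1, \dots, u_{i-1} \right] + 3 \cdot \mathbb{E}\left[ \|u_i\| \mid u_1, \dots, u_{i-1} \right]^2 \\
&\leq 4 \cdot \mathbb{E}\left[ \|u_i\|^2 \mid u_1, \dots, u_{i-1} \right],
\end{aligned}$$

so $\sum_{i=1}^n \mathbb{E}[d_i^2 \mid u_1, \dots, u_{i-1}] \leq 4v$ almost surely.

The claim now follows from Bernstein's inequality (Lemma 2), applied with $2b$ in place of $b$ and $4v$ in place of $v$. □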



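Lemma 4. If $u_1, \dots, u_n$ is a martingale difference vector sequence (i.e., $\mathbb{E}[u_i \mid u_1, \dots, u_{i-1}] = 0$ for
all $i = 1, \dots, n$), then

$$\mathbb{E}\left[ \Big\| \sum_{i=1}^n u_i \Big\|^2 \right] = \sum_{i=1}^n \mathbb{E}\left[ \|u_i\|^2 \right].$$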



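Proof. Let $s_i := u_1 + \cdots + u_i$ for $i = 1, \dots, n$; we have

$$\begin{aligned}
\mathbb{E}\left[ \|s_n\|^2 \right]
&= \mathbb{E}\left[ \mathbb{E}\left[ \|u_n + s_{n-1}\|^2 \mid u_1, \dots, u_{n-1} \right] \right] \\
&= \mathbb{E}\left[ \mathbb{E}\left[ \|u_n\|^2 + 2 u_n^\top s_{n-1} + \|s_{n-1}\|^2 \mid u_1, \dots, u_{n-1} \right] \right] \\
&= \mathbb{E}\left[ \|u_n\|^2 \right] + \mathbb{E}\left[ \|s_{n-1}\|^2 \right],
\end{aligned}$$

so the claim follows by induction. □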
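Lemma 4 can be verified exactly in a small case. The sketch below assumes $u_i = a_i x_i$ with independent Rademacher signs $x_i$ (one particular martingale difference sequence, chosen here for illustration) and enumerates all $2^n$ sign patterns, so the expectation is computed without sampling error.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, arbitrary
n, dim = 4, 3                              # illustrative sizes
a = rng.standard_normal((n, dim))          # row i is the fixed vector a_i

# All 2^n equally likely outcomes of (x_1, ..., x_n) in {-1, +1}^n.
signs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
sums = signs @ a                           # each row is sum_i a_i x_i for one outcome

lhs = np.mean(np.sum(sums ** 2, axis=1))   # E || sum_i u_i ||^2, exact average over outcomes
rhs = np.sum(a ** 2)                       # sum_i E ||u_i||^2 = sum_i ||a_i||^2
print(lhs, rhs)
```

The cross terms $2 u_i^\top u_j$ average to zero over the sign patterns, which is the same cancellation the induction step uses.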



A.2 Gaussian quadratic forms and $\chi^2$ tail inequalities

It is well-known that if $z \sim \mathcal{N}(0, 1)$ is a standard Gaussian random variable, then $z^2$ follows a
$\chi^2$ distribution with one degree of freedom. The following inequality due to Laurent and Massart
(2000) gives a bound on linear combinations of $\chi^2$ random variables.



Lemma 5 ($\chi^2$ tail inequality; Laurent and Massart, 2000). Let $q_1, \dots, q_n$ be independent $\chi^2$ random
variables, each with one degree of freedom. For any vector $\gamma = (\gamma_1, \dots, \gamma_n) \in \mathbb{R}_+^n$ with non-negative
entries, and any $t > 0$,

$$\Pr\left[ \sum_{i=1}^n \gamma_i q_i > \|\gamma\|_1 + 2\sqrt{\|\gamma\|_2^2\, t} + 2\|\gamma\|_\infty t \right] \leq e^{-t}.$$



Proof of Proposition 1. Let $V \Lambda V^\top$ be an eigen-decomposition of $A^\top A$, where $V$ is a matrix of
orthonormal eigenvectors, and $\Lambda := \operatorname{diag}(\rho_1, \dots, \rho_n)$ is the diagonal matrix of corresponding
eigenvalues $\rho_1, \dots, \rho_n$. By the rotational invariance of the distribution, $z := V^\top x$ is an isotropic
multivariate Gaussian random vector with mean zero. Thus, $\|Ax\|^2 = z^\top \Lambda z = \rho_1 z_1^2 + \cdots + \rho_n z_n^2$, and the
$z_i^2$ are independent $\chi^2$ random variables, each with one degree of freedom. The claim now follows
from a tail bound for $\chi^2$ random variables (Lemma 5, due to Laurent and Massart, 2000). □
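The change of variables in this proof is a deterministic identity and can be checked on a single draw. In the sketch below the matrix, the seed, and the sizes are arbitrary illustrative choices; `np.linalg.eigh` supplies the eigen-decomposition of $A^\top A$.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, arbitrary
m, n = 4, 6                             # illustrative sizes
A = rng.standard_normal((m, n))

# Eigen-decomposition A^T A = V diag(rho) V^T with orthonormal columns in V.
rho, V = np.linalg.eigh(A.T @ A)

x = rng.standard_normal(n)              # one draw of the isotropic Gaussian vector
z = V.T @ x                             # rotated coordinates

lhs = np.sum((A @ x) ** 2)              # ||A x||^2
rhs = np.sum(rho * z ** 2)              # rho_1 z_1^2 + ... + rho_n z_n^2
print(lhs, rhs)
```

Because $V$ is orthogonal, $z$ has the same distribution as $x$, which is what lets Lemma 5 be applied with $\gamma = \rho$.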






